HIGHLIGHTS

► We use compressive sensing to study a fusion method for infrared and visible images.

► We propose, for the first time, a fusion rule based on the maximum absolute entry of the sparse vector.

► With the same parameter setting, dictionary and fusion rule, the method using OMP provides better results.

► The method IRdictionary_maxabsolute_OMP attains almost all of the largest objective evaluation values.

To obtain a more exact, reliable and better description than a single source image provides, source
images taken by different sensors need to be fused into one synthetic image. This paper uses the theory of
compressive sensing to study the fusion of infrared and visible images. A fusion method based on
compressive sensing theory has three parts: the overcomplete dictionary, the sparse vector approximation
algorithm, and the fusion rule. We study twelve fusion approaches obtained by combining three
overcomplete dictionaries trained by K-means Singular Value Decomposition (K-SVD) (one using only
patches from the infrared images, one using only patches from the visible images, and one using the
combined patches), two sparse vector approximation algorithms (orthogonal matching pursuit and
polytope faces pursuit), and two fusion rules (the maximum $\ell^1$-norm rule and the maximum absolute
entry of the sparse vector, which is first proposed in this paper). The experimental results show that, for
the same parameter setting, dictionary and fusion rule, the method using orthogonal matching pursuit
provides better fusion results, and that the method combining the dictionary trained only on patches from
the infrared images, the maximum-absolute-entry fusion rule and orthogonal matching pursuit attains
almost all of the largest objective evaluation values and the best fusion quality.

© 2012 Elsevier B.V. All rights reserved. 


1. Introduction 

In recent years, with the development of image sensor technology, many images of one object or
scene can be acquired by multiple sensors. Depending on the physical properties of the sensors and
the way the images are obtained, the source images taken directly from each sensor contain unique
information, which is usually complementary. Taking the infrared and visible images studied in this
paper as an example, infrared images have lower contrast and definition than visible images, but
visible images cannot capture targets effectively in low-visibility conditions [1,2]. Fusing infrared
and visible images yields a more exact, reliable and better description than a single source image.
Image fusion can therefore be regarded as the foundation of the analysis of multisensor images [3].
To extend or enhance the information about a scene, it is necessary to develop image fusion
techniques that combine the images captured by different sensors. Image fusion is the process of
detecting salient features in the source images and fusing these details into a synthetic image. In
the past few years, image fusion has been applied widely in diverse fields, such as target detection,
intelligent surveillance, and nondestructive inspection [4-6].

The image fusion methods developed or applied in the past two decades can be divided into three
levels, pixel-level, feature-level, and decision-level, according to the stage at which the information
acquired by different image sensors is fused into a synthetic image [7,8]. Pixel-level fusion, which
combines the independent source images into a single image, preserves most of the information and
is widely studied. Feature-level methods, which typically use features of the source images (such as
edges or regions) to fuse them, are usually robust to noise and misregistration. Since decision-level
methods combine image descriptions directly, their field of application is greatly limited. The
method proposed in this paper belongs to pixel-level fusion. Pixel-level approaches fall mainly into
two categories [9,10]: spatial domain-based methods and transformed domain-based methods. The
simplest fusion method in the spatial domain just takes the pixel-by-pixel average of the source
images. Spatial domain-based methods often lead to undesirable side effects in the fused image, such
as lower contrast and blockiness [11]. In the past decades, multiscale transforms have been among
the most successful image fusion methods [12]. As an outstanding transformed domain-based method,
image fusion based on a multiscale transform has three basic steps [13]: (1) decompose the source
images into multiscale representations with different resolutions and orientations; (2) integrate,
according to the fusion rules, the decomposition coefficients that denote the multiscale
representations and are linked with the salient features of the original images; (3) reconstruct the
fused image by the inverse transform of the integrated multiscale coefficients.

The earliest multiscale transform used for image fusion is pyramid decomposition, for example the
Laplacian pyramid [14], morphological pyramid [15], and gradient pyramid [16]. A pyramid method
first decomposes each source image into a series of images of different sizes and resolutions (called
a pyramid), then extracts the value with the highest saliency at each position in the decomposed
images, and finally reconstructs the fused image using the inverse transform of the composite images.
Another family of multiscale transforms for image fusion is wavelet transform-based methods, which
use a scheme similar to pyramid decomposition, such as the discrete wavelet transform [17,18],
stationary wavelet transform [19], and dual-tree complex wavelet transform [20,21]. The principal
shortcoming of these methods is that most of the multiscale transforms are not shift invariant, which
is caused by the underlying down-sampling process. Recently, multiscale geometric analysis has been
developed and used for image fusion to improve fusion results. Typical methods include the ridgelet
transform [22] and curvelet transform [23]. However, although different wavelets can represent
different image details, wavelets and related multiscale transforms cannot extract all of the
underlying information of the source images effectively. The reason is that the dictionary
constructed from the basis functions is limited. The decomposition coefficients of the fused image,
obtained with a limited dictionary in the transformed domain, may cause all the pixel values to
change in the spatial domain. Consequently, in some cases multiscale transform-based fusion methods
may produce undesirable artifacts.

To make the fused image more accurate, it is necessary to explore a novel method that extracts the
underlying information of the source images more efficiently and completely. This paper proposes an
image fusion method based on the recently developed theory of compressive sensing (CS). CS theory
has been successfully applied in different fields of image processing and computer vision [24-27],
such as image denoising, image compression, feature extraction, and target classification. The core
of CS is sparse representation, which describes natural signals, including images, by a sparse linear
combination of columns of an overcomplete dictionary. Unlike the limited dictionary of a multiscale
transform, CS uses an overcomplete dictionary in which every column is called a signal atom.
Overcompleteness is the most prominent characteristic of the dictionary used in CS theory. It means
that the number of signal atoms in the dictionary is much larger than the signal dimension, and it
guarantees a more meaningful and complete representation of source signals than traditional
multiscale transforms [28]. CS theory reveals that the coefficients corresponding to natural signals
are sparse. Accordingly, Li and Yang employ the sparse coefficient vectors to give a framework for
CS-based image fusion [29].


Following this framework, the image fusion method proposed in this paper contains the following
steps. First, the overcomplete dictionary is created and trained by K-means Singular Value
Decomposition (K-SVD) [30]. Second, the source images are divided into patches by a sliding window,
which is adopted to better capture the local salient features of the source images and improve the
fusion quality. Third, the patches are decomposed over the overcomplete dictionary into their
corresponding sparse coefficients. Fourth, two fusion rules are employed to combine the coefficients
of the source images. Finally, the fused image is reconstructed from the combined coefficients.

The rest of the paper is organized as follows: Section 2 presents 
the basic theory of CS and K-SVD for the overcomplete dictionary 
creation. In Section 3, the fusion scheme based on CS and the fusion 
rules based on sparse coefficients are discussed. Experiments and
conclusions are presented in Sections 4 and 5, respectively.

2. Basic theory of compressive sensing 

Before applying the CS-based method to image fusion, a brief description of CS theory is necessary.
CS theory has four essential elements. First, as mentioned above, overcompleteness is one of the
major characteristics of CS. Second, another important concept is sparsity. Third, the sparse
coefficients of the linear combination of atoms of the overcomplete dictionary have to be computed.
Lastly, creating an overcomplete dictionary is the key step in using CS theory in practice.

2.1. Overcomplete representation 

Suppose we are given a signal vector $y \in \mathbb{R}^n$ and a collection of vectors
$\varphi_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, where $m > n$. Such a collection is usually called a
dictionary and each vector $\varphi_i$ is an atom. The given signal can be represented as a linear
combination of atoms in the dictionary, $y = \sum_{i=1}^{m} a_i \varphi_i$. Unlike a traditional
multiscale transform basis representation, such a linear combination over a dictionary offers a wider
range of generating atoms [31]. The dictionary is therefore overcomplete and is called an
overcomplete dictionary [32]. An overcomplete representation based on such a dictionary allows more
flexibility in signal representation and more effectiveness in signal processing [33].

Taking the atoms as the columns of the overcomplete dictionary $\Phi$, we have
$\Phi = [\varphi_1, \varphi_2, \ldots, \varphi_m]$, so that $\Phi \in \mathbb{R}^{n \times m}$. A
representation of the given signal $y$ can then be described as a coefficient vector
$a = [a_1, a_2, \ldots, a_m]^T$ satisfying $y = \Phi a$. Since $m > n$, the problem of overcomplete
representation is underdetermined, which means that the coefficient vector $a$ has no unique
solution. By imposing a sparsity constraint, CS theory can in certain circumstances single out a
sparse coefficient vector as the linear combination over the overcomplete dictionary. This
coefficient vector is called the sparse representation (shown in Fig. 1).
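
As a small illustration of why the coefficient vector is not unique when the dictionary has more atoms than the signal has dimensions, the sketch below uses a toy dictionary with 2 rows and 3 atoms. The matrix and vectors are made-up values chosen for illustration and do not come from the experiments in this paper.

```python
import numpy as np

# Toy overcomplete dictionary: n = 2 signal dimensions, m = 3 atoms (illustrative values).
Phi = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])
y = np.array([1.0, 1.0])

# Two different coefficient vectors that both reproduce y exactly.
a_dense = np.array([1.0, 1.0, 0.0])   # uses two atoms
a_sparse = np.array([0.0, 0.0, 1.0])  # uses a single atom

print(np.allclose(Phi @ a_dense, y), np.count_nonzero(a_dense))
print(np.allclose(Phi @ a_sparse, y), np.count_nonzero(a_sparse))
```

Both vectors satisfy the representation equation; the sparsity constraint is what prefers the one-atom solution.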

2.2. Sparse representation and sparse vector approximation 

Generally speaking, the purpose of CS theory is to find the sparsest possible representation in an
overcomplete dictionary. As a measure of the sparsity of a vector $a$, the $\ell^0$-norm $\|a\|_0$
denotes the number of non-zero entries in $a$. The sparsest representation is the solution to the
optimization problem [34]:

$$\min_a \|a\|_0 \quad \text{s.t.} \quad y = \Phi a \qquad (1)$$

In most practical situations, the above formula can be modified to include a noise allowance [33]:

$$\min_a \|a\|_0 \quad \text{s.t.} \quad \|y - \Phi a\|_2 < \delta \qquad (2)$$

where $\delta$ is the allowed global error.

Fig. 1. Overcomplete dictionary and sparse representation.

The above optimization is an NP-hard problem. To find the sparsest representation of the vector $y$,
we would have to enumerate all possible combinations of columns of $\Phi$. Generally, such an
algorithm costs on the order of $O(2^m)$ flops [33]. Thus, approximations or relaxations of this
optimization problem have to be used in place of computing exact solutions.

In the past decade, several algorithms have been proposed to solve this problem. The most typical
methods are a series of greedy algorithms and $\ell^1$-norm minimization-based algorithms. As the
simplest greedy algorithm, Matching Pursuit (MP) builds up $k$-atom ($k \le m$) approximate
representations one step at a time, adding to an existing $(k-1)$-atom approximation a new term
chosen in a greedy fashion to minimize the resulting $\ell^2$-norm error [31]. When stopped after $N$
iterations, one gets a sparse approximate representation. Based on MP, Orthogonal MP (OMP) was
further proposed to reduce the number of iterations and the reconstruction error and to improve
robustness [35-37]. The basic idea of the $\ell^1$-norm minimization-based algorithms is to replace
the $\ell^0$-norm with the $\ell^1$-norm. Recent developments in applied mathematics reveal that the
solution of the $\ell^0$-norm minimization problem is equal to the solution of the following
$\ell^1$-norm minimization problem under certain conditions [34]:


$$\min_a \|a\|_1 \quad \text{s.t.} \quad y = \Phi a \qquad (3)$$


Considering the noise allowance, the above formula is modified to the following formula:

$$\min_a \|a\|_1 \quad \text{s.t.} \quad \|y - \Phi a\|_2 < \delta \qquad (4)$$


This $\ell^1$-norm minimization problem can be solved by standard linear programming methods. In CS
theory, this approach of replacing the $\ell^0$-norm with the $\ell^1$-norm is called Basis Pursuit
(BP) [38,39]. Because the solution is known to be very sparse, even more efficient methods based on
the BP algorithm are available, such as the Polytope Faces Pursuit (PFP) algorithm [40].
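
The greedy scheme described in this section can be sketched in NumPy as follows. This is a minimal illustration of OMP with the noise-allowance stopping rule of Eq. (2), not the implementation used in the experiments; the function name `omp` and the parameters `delta` and `max_atoms` are assumptions introduced here.

```python
import numpy as np

def omp(Phi, y, delta=1e-6, max_atoms=None):
    """Greedy OMP sketch: stop once ||y - Phi a||_2 < delta or max_atoms are selected."""
    n, m = Phi.shape
    max_atoms = n if max_atoms is None else max_atoms
    a = np.zeros(m)
    support = []                      # indices of the atoms selected so far
    residual = y.astype(float).copy()
    coeffs = np.zeros(0)
    while np.linalg.norm(residual) >= delta and len(support) < max_atoms:
        # Select the atom most correlated with the current residual.
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0   # never reselect an atom
        support.append(int(np.argmax(correlations)))
        # Orthogonal step: least-squares fit of y on all selected atoms.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    if support:
        a[support] = coeffs
    return a
```

The least-squares refit over the whole support is what distinguishes OMP from plain MP, which only updates the coefficient of the newly selected atom.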

2.3. Creation of overcomplete dictionary 

For image fusion, the given signal $y$ can be obtained directly from the source images, but the
overcomplete dictionary $\Phi$ cannot. Therefore, techniques for designing a dictionary that better
fits sparse representation have been studied. At present, these techniques fall mainly into two
categories. In the first category, the dictionary is selected from overcomplete transform bases or
created as a hybrid of several multiscale transform bases. Typical examples include the overcomplete
Discrete Cosine Transform (DCT) dictionary consisting of DCT bases and the hybrid dictionary
consisting of DCT bases, wavelet bases, ridgelet bases and so on [29]. The second category is a
trained dictionary learned from training samples. K-SVD, presented in [30], belongs to the second
category. Recent studies reveal that a dictionary trained by K-SVD outperforms dictionaries of the
first category in image processing tasks such as image reconstruction and image denoising [24].

The idea of K-SVD comes from K-means for Vector Quantization (VQ). Typically, the K-means algorithm
is used to train the dictionary of VQ codewords. Suppose the codebook matrix is
$C = [c_1, c_2, \ldots, c_K]$, where $c_k$ is a codeword of the codebook (dictionary). When $C$ is
given, each signal $y_i$ is represented by its closest codeword under the $\ell^2$-norm distance.
That means $y_i = C a_i$, where $a_i = e_j$ is a vector from the trivial basis with all zero entries
except a one in the $j$th position [41]. K-means can be considered an extreme case of sparse
representation in the sense that only one atom (codeword) is allowed to participate in the linear
combination representing $y_i$. The representation Mean Square Error (MSE) of $y_i$ is defined as
$\|y_i - C a_i\|_2^2$, and for a wide family of signals $Y = \{y_i\}_{i=1}^{N}$ ($N \gg K$), the
overall MSE is $\|Y - CA\|_F^2$, where $A = \{a_i\}_{i=1}^{N}$. Therefore, the objective function of
K-means is as follows:


$$\min_{C,A} \|Y - CA\|_F^2 \quad \text{s.t.} \quad \forall i,\ a_i = e_k \text{ for some } k \qquad (5)$$


The K-means algorithm is an iterative method. Each iteration contains two stages: computing $A$ and
updating the codebook. In the K-means algorithm, each given signal must be represented by only one
atom (codeword). When several atoms of the dictionary are allowed to represent a given signal, the
K-means algorithm becomes the K-SVD algorithm. In other words, the coefficient vector of the linear
combination representing a given signal is allowed more than one non-zero entry in K-SVD, so K-SVD
can be viewed as a generalization of K-means. In this case, the optimization problem corresponding
to Eq. (5) is that of searching for the best possible dictionary $D$ for the sparse representation
of the given signal set $Y$:


$$\min_{D,A} \|Y - DA\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|a_i\|_0 \le T_0 \qquad (6)$$


where the dictionary $D$ plays the role of the codebook $C$ of the K-means algorithm, and $T_0$ is a
threshold denoting the maximum allowed number of non-zero entries of each coefficient vector.

The K-SVD algorithm is used to solve Eq. (6) by an iterative approach. Each iteration can be divided
into two stages: the sparse representation computation stage and the dictionary update stage. In the
first stage, the algorithm fixes $D$ and uses a sparse vector approximation algorithm to obtain the
coefficient matrix. The purpose of the second stage is to search for a better dictionary. The process
updates only one column of $D$ at a time: all columns of $D$ except one, $d_k$, are fixed, and a new
atom $d_k$ and new values for its coefficients are found that best reduce the MSE. Singular Value
Decomposition (SVD) is employed to decompose the overall representation error matrix $E_k$, which
stands for the error over all $N$ columns of $Y$ when the $k$th atom is removed:
$E_k = Y - \sum_{j \ne k} d_j a_T^j$, $j = 1, 2, \ldots, k-1, k+1, \ldots, K$, where $a_T^j$ denotes
the $j$th row of the coefficient matrix $A$, which corresponds to the column $d_j$ of the dictionary.
The stopping condition of the iteration can be a given number of iterations.
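
The dictionary update stage for a single atom can be sketched as below. This is a hedged illustration rather than the code used in this paper: the function name `update_atom` is an assumption, and restricting the error matrix to the signals that currently use the atom (so that the sparsity pattern is preserved) is a detail of K-SVD as it is usually described, which the text above does not spell out.

```python
import numpy as np

def update_atom(D, A, Y, k):
    """Update atom k of D and row k of A in place via a rank-1 SVD approximation."""
    users = np.flatnonzero(A[k, :])   # signals whose representation uses atom k
    if users.size == 0:
        return                        # unused atom: left untouched in this sketch
    # Error matrix with atom k's contribution removed, restricted to those signals.
    E_k = Y[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
    U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                 # new unit-norm atom
    A[k, users] = s[0] * Vt[0]        # new coefficients for that atom
```

Because the leading singular pair gives the best rank-1 approximation of the restricted error matrix, this step cannot increase the overall representation error.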

In this paper, we select three different trained dictionaries to study image fusion: (1) the
dictionary trained only on patches from the infrared images; (2) the dictionary trained only on
patches from the visible images; (3) the dictionary trained on a corpus of patches taken from both
infrared and visible images.

3. Image fusion algorithm based on CS theory 

Generally speaking, source images used for fusion must be registered. Recently, many effective
approaches for image registration have been proposed. The source images used in this paper, from
www.imagefusion.org, have been geometrically registered. The framework of image fusion based on CS
theory is summarized in Fig. 2. The method contains four stages. In the first stage, the given
signal set $Y$ is obtained from each source image using the sliding window technique and image patch
vectorization. In the second stage, the sparse coefficient matrices of the source images $I^{IR}$ and
$I^{VI}$ are computed by approximation algorithms, including OMP and the $\ell^1$-norm
minimization-based algorithm. In the third stage, the fused sparse coefficient matrix $A^F$ is
obtained using the coefficient fusion rule. Lastly, the fused image $I^F$ is constructed from the
fused sparse coefficient matrix $A^F$.

Fig. 2. Framework of image fusion based on CS theory.

3.1. Obtain given signal set Y 

A signal set $Y = \{y_i\}$ comes from a source image, and each source image corresponds to one given
signal set $Y$. Because image fusion generally depends on local information in the source images, a
sparse coefficient matrix representing the entire image cannot be used directly for image fusion. To
solve this problem, Li proposed dividing the source images into small patches and making the sparse
representation shift invariant by a sliding window technique [9,29]. The sliding window technique and
the method of obtaining the given signal set $Y$ are shown in Fig. 3. First, the source image $I$ of
size $M \times N$ is divided, from top-left to bottom-right, into all possible image patches of size
$n \times n$, where $n^2$ is the length of an atom in the dictionary, $n < M$ and $n < N$. Each image
patch is then vectorized, in row-to-column order, into a column $y_i$ of $Y$. When the sizes of the
source image and the image patch are $M \times N$ and $n \times n$ respectively, the size of the
given signal set $Y$ is $n^2 \times (M - n + 1)(N - n + 1)$. The size of the image patch (sliding
window) plays an important role in CS-based image fusion [9]. Larger patches increase the length of
the atoms of the corresponding dictionary and slow down the computation of the sparse vectors. As the
patch size decreases, the computation of the sparse vectors becomes faster, but the patches contain
less information and may miss some important features of the source images. In this paper, the patch
size $n$ is set to 8, which has been shown to be an appropriate setting for image denoising and
fusion [9,24].

Fig. 3. Process of obtaining given signal set Y.
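
The sliding window step can be sketched as follows. This is an illustrative NumPy version, not the code used in this paper; the function name `image_to_signal_set` is an assumption, and row-major flattening of each patch is assumed as the vectorization order.

```python
import numpy as np

def image_to_signal_set(image, n=8):
    """Collect every n x n patch of a 2-D image as one column of the signal set Y."""
    M, N = image.shape
    columns = []
    for r in range(M - n + 1):          # top-left to bottom-right
        for c in range(N - n + 1):
            patch = image[r:r + n, c:c + n]
            columns.append(patch.reshape(-1))   # row-major vectorization (assumed)
    return np.stack(columns, axis=1)    # shape: (n*n, (M-n+1)*(N-n+1))
```

For example, a 10 x 12 image with 8 x 8 patches gives a signal set with 64 rows and 3 x 5 = 15 columns.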

3.2. Fusion scheme 

We assume that the source images $I^{IR}$ and $I^{VI}$ are taken by the infrared and visible sensors,
respectively. As mentioned earlier, the given signal sets $Y^{IR}$, corresponding to $I^{IR}$, and
$Y^{VI}$, corresponding to $I^{VI}$, are obtained using the sliding window technique. In the matrix
$Y^{IR}$ or $Y^{VI}$, each column $y_i^{IR}$ or $y_i^{VI}$ corresponds to one patch of the source
image $I^{IR}$ or $I^{VI}$, and $a_i^{IR}$ and $a_i^{VI}$ are the sparse representations of the
columns $y_i^{IR}$ and $y_i^{VI}$, respectively. The process of image fusion based on the sparse
representation matrices is as follows:

• Step 1. According to the sparse coefficient fusion rule, corresponding columns of the sparse
representation matrices $A^{IR}$ and $A^{VI}$ of the source images are fused to generate the fused
sparse representation matrix $A^F$;

• Step 2. The vector representation of the fused image, $Y^F$, is obtained by $Y^F = \Phi A^F$;

• Step 3. As the inverse of the process of obtaining the given signal set, each column $y_i^F$ of
$Y^F$ is reshaped into a block of size $n \times n$, and the block is added, at its patch position,
to a matrix $S$ of size $M \times N$ initialized to zero;

• Step 4. According to Fig. 3, the value at each pixel position of $S$ is the sum of several block
values; thus, each pixel value of $S$ is divided by the number of additions at its position to obtain
the corresponding pixel value of the fused image $I^F$. The number of additions at each pixel
position of $S$ equals the number of sliding windows that cover that position.
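
Steps 3 and 4 amount to an overlap-add followed by a per-pixel division. A minimal sketch, assuming the patches were extracted from top-left to bottom-right with row-major flattening (the function name `signal_set_to_image` is hypothetical):

```python
import numpy as np

def signal_set_to_image(Y_F, shape, n=8):
    """Overlap-add the columns of Y_F into an image and average by the overlap count."""
    M, N = shape
    S = np.zeros((M, N))        # accumulated block values
    counts = np.zeros((M, N))   # number of blocks added at each pixel
    j = 0
    for r in range(M - n + 1):
        for c in range(N - n + 1):
            S[r:r + n, c:c + n] += Y_F[:, j].reshape(n, n)
            counts[r:r + n, c:c + n] += 1
            j += 1
    return S / counts, counts
```

Corner pixels are covered by a single window, while interior pixels are covered by up to n x n windows, so the count matrix is smallest at the image borders.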
The coefficient fusion rule is usually determined by the activity level of the atom, which can be
described by the absolute value of the corresponding coefficient in the sparse representation
vectors [10,13]. Several algorithms employ the average of the coefficients $a_i$ or their absolute
maximum to express the activity level. The averaging fusion rule averages the corresponding
coefficients to obtain the fused sparse vector. This rule can retain the contrast information of the
source images, but the high-frequency features that carry details are smoothed. Generally, a fusion
algorithm aims to integrate as much of the information of the source images as possible into the
fused image. This paper studies two rules for combining the coefficient vectors. The first is the
maximum $\ell^1$-norm fusion rule, which obtains the fused coefficient vector by selecting the
coefficient vector with the maximum $\ell^1$-norm [29]. Because the maximum absolute coefficient of
the sparse vector denotes the most important information of the source images, this paper proposes a
second fusion rule, called the maximum absolute entry (maxabsolute) fusion rule, which obtains the
fused coefficient vector by choosing the coefficient vector whose largest absolute entry is the
greater of the two, and is calculated by

$$a_i^F = \begin{cases} a_i^{IR}, & \text{if } \max(|a_i^{IR}|) > \max(|a_i^{VI}|) \\ a_i^{VI}, & \text{if } \max(|a_i^{IR}|) \le \max(|a_i^{VI}|) \end{cases}$$

where $a_i^{IR}$ and $a_i^{VI}$ are the sparse representations, $|\cdot|$ takes the absolute value of
each entry of a vector, and $\max(\cdot)$ returns the maximum entry of a vector.
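
The two fusion rules can be sketched column-wise in NumPy as follows. The function names are hypothetical, and sending ties to the visible-image vector in the maximum-norm rule is an assumption made here to mirror the maxabsolute rule; the text does not state how ties are handled for that rule.

```python
import numpy as np

def fuse_max_l1(A_ir, A_vi):
    """Per column, keep the coefficient vector with the larger l1-norm (ties: visible)."""
    take_ir = np.abs(A_ir).sum(axis=0) > np.abs(A_vi).sum(axis=0)
    return np.where(take_ir, A_ir, A_vi)

def fuse_max_absolute(A_ir, A_vi):
    """Per column, keep the vector whose largest absolute entry is greater (ties: visible)."""
    take_ir = np.abs(A_ir).max(axis=0) > np.abs(A_vi).max(axis=0)
    return np.where(take_ir, A_ir, A_vi)
```

The two rules can disagree: a vector with one dominant coefficient may win under maxabsolute while a vector with several moderate coefficients wins under the norm-based rule.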

4. Experimental verification and discussion 

4.1. Experiment setup 

This section presents the fused images and compares objective evaluation measures of the fusion
results for different overcomplete dictionaries, different fusion rules and different sparse vector
approximations.

Fig. 4. Three different overcomplete dictionaries: (a) trained dictionary only using patches from
the infrared image; (b) trained dictionary only using patches from the visible images; (c) hybrid
dictionary with a corpus of patches from infrared and visible images.

The overcomplete dictionaries used in this paper are trained by the K-SVD algorithm. The training
samples consist of 20000 patches of size 8 x 8, randomly taken from different images including
infrared and visible images. According to the proportion of the different types of training patches,
three kinds of overcomplete dictionaries are used to test and evaluate the performance of the
proposed methods. Fig. 4a-c respectively show the dictionaries trained only on patches from the
infrared images (IRdictionary), only on patches from the visible images (VIdictionary), and on
combined patches, half from infrared and half from visible images (HYdictionary). In Fig. 4, K of
the K-SVD algorithm is set to 256 and the number of dictionary training iterations is 100.

Because source images are acquired by various sensors, and the performance of a fusion algorithm
differs across types of source images, evaluating fused image quality is a difficult task, and there
are several different criteria for different purposes. Fusion evaluation criteria comprise objective
and subjective measures. The subjective measure is to inspect the fused images visually. An
objective fusion measure should extract all the perceptually important information that exists in
the input images and measure the ability of the fusion process to transfer this information as
accurately as possible into the fused image. For the fusion of infrared and visible images discussed
in this paper, we select three objective evaluation measures that have been validated to a large
degree for quantitatively evaluating fusion performance [42]: MI (mutual information), $Q_W$, and
$Q^{AB/F}$. A larger value of MI indicates better fusion performance. For an outstanding fusion
result, $Q^{AB/F}$ should be as close to 1 as possible. A larger $Q_W$ denotes better fused results.

As mentioned earlier, combining the three overcomplete dictionaries, the two fusion rules for the
coefficient matrix and the two sparse vector approximations gives the following methods, which are
discussed and compared in this paper:

(1) IRdictionary_maxl1norm_OMP: This method employs the trained dictionary only using patches
from the infrared image, the $\ell^1$-norm maximum fusion rule, and OMP sparse vector
approximation.

(2) IRdictionary_maxl1norm_PFP: This method employs the trained dictionary only using patches
from the infrared image, the $\ell^1$-norm maximum fusion rule, and PFP sparse vector
approximation.

(3) VIdictionary_maxl1norm_OMP: This method employs the trained dictionary only using patches
from the visible image, the $\ell^1$-norm maximum fusion rule, and OMP sparse vector
approximation.

(4) VIdictionary_maxl1norm_PFP: This method employs the trained dictionary only using patches
from the visible image, the $\ell^1$-norm maximum fusion rule, and PFP sparse vector
approximation.

(5) HYdictionary_maxl1norm_OMP: This method employs the trained dictionary using the combined
(hybrid) patches, the $\ell^1$-norm maximum fusion rule, and OMP sparse vector approximation.

(6) HYdictionary_maxl1norm_PFP: This method employs the trained dictionary using the combined
(hybrid) patches, the $\ell^1$-norm maximum fusion rule, and PFP sparse vector approximation.

(7) IRdictionary_maxabsolute_OMP: This method employs the trained dictionary only using patches
from the infrared image, the absolute of entry maximum fusion rule, and OMP sparse vector
approximation.

(8) IRdictionary_maxabsolute_PFP: This method employs the trained dictionary only using patches
from the infrared image, the absolute of entry maximum fusion rule, and PFP sparse vector
approximation.

(9) VIdictionary_maxabsolute_OMP: This method employs the trained dictionary only using patches
from the visible image, the absolute of entry maximum fusion rule, and OMP sparse vector
approximation.

(10) VIdictionary_maxabsolute_PFP: This method employs the trained dictionary only using patches
from the visible image, the absolute of entry maximum fusion rule, and PFP sparse vector
approximation.

(11) HYdictionary_maxabsolute_OMP: This method employs the trained dictionary using the combined
(hybrid) patches, the absolute of entry maximum fusion rule, and OMP sparse vector approximation.

(12) HYdictionary_maxabsolute_PFP: This method employs the trained dictionary using the combined
(hybrid) patches, the absolute of entry maximum fusion rule, and PFP sparse vector approximation.

Fig. 5. Source images: (a), (c), (e) infrared source images 1-3; (b), (d), (f) visible source
images 1-3.

Fig. 6. Fused images of source images 1 using different approaches: (a) IRdictionary_maxl1norm_OMP;
(b) VIdictionary_maxl1norm_OMP; (c) HYdictionary_maxl1norm_OMP; (d) IRdictionary_maxl1norm_PFP;
(e) VIdictionary_maxl1norm_PFP; (f) HYdictionary_maxl1norm_PFP; (g) IRdictionary_maxabsolute_OMP;
(h) VIdictionary_maxabsolute_OMP; (i) HYdictionary_maxabsolute_OMP; (j) IRdictionary_maxabsolute_PFP;
(k) VIdictionary_maxabsolute_PFP; (l) HYdictionary_maxabsolute_PFP.
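
The twelve method names follow a regular pattern (dictionary, fusion rule, sparse approximation), so they can be generated programmatically, for example when scripting the comparison. The sketch below is only an assumption about how such a script might be organized; the list names are hypothetical and the name strings use a normalized spelling of the method names in the list.

```python
from itertools import product

DICTIONARIES = ["IRdictionary", "VIdictionary", "HYdictionary"]
FUSION_RULES = ["maxl1norm", "maxabsolute"]
SOLVERS = ["OMP", "PFP"]

# The fusion rule varies slowest and the solver fastest,
# which reproduces the numbering (1)-(12) used in the list.
METHODS = [
    f"{dictionary}_{rule}_{solver}"
    for rule, dictionary, solver in product(FUSION_RULES, DICTIONARIES, SOLVERS)
]

for number, name in enumerate(METHODS, start=1):
    print(f"({number}) {name}")
```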

In the experiment, three pairs of infrared and visible source images, shown in Fig. 5, are used to
test the 12 methods. In the infrared source images (Fig. 5a, c, and e), the person can be observed
clearly, while in the visible source images (Fig. 5b, d, and f), the background, including trees,
roads and barriers, is clear. A fusion algorithm needs to show both the person and the background
clearly in the fused image.

4.2. Experiment results and performance evaluation 

The fused images of Fig. 5a and b obtained with the 12 fusion
approaches are shown in Fig. 6a-l. Fig. 7a-l and Fig. 8a-l show the
fused results of source image pairs 2 and 3 of Fig. 5,
respectively. In these experiments, we fix the global error δ in
Eqs. (2) and (4), used in the OMP and PFP algorithms, to 1; the
size of all overcomplete dictionaries shown in Fig. 4 is 64 × 256.
Figs. 6-8 show that the visual quality of the first and third rows
(a, b, c, g, h, and i), which use OMP sparse vector approximation,
is better than that of the second and fourth rows (d, e, f, j, k,
and l), which use PFP approximation, under the same parameter
setting. For example, the fused images in the second and fourth
rows of Figs. 6-8 are blurred, and the details and features of the
source images are not preserved effectively. In contrast, the
fused images in the first and third rows of Figs. 6-8 faithfully
preserve the important detailed information of the source images.
In these images the person is clearly enhanced and most of the
other useful background information is preserved. From Figs. 6-8,
we can see that the methods using OMP approximation provide better
visual fusion results under the same parameter setting.
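
To make the role of the global error concrete, the following is a minimal sketch of OMP with an error-based stopping rule. It assumes that the constraint in Eq. (2) bounds the l2-norm of the residual by the global error (called `delta` here); the function `omp`, the random dictionary and the synthetic patch are illustrative assumptions and not the trained dictionaries of Fig. 4.

```python
import numpy as np

def omp(D, x, delta=1.0, max_atoms=None):
    """Greedy OMP: add atoms until ||x - D @ s||_2 <= delta (assumed criterion)."""
    n, K = D.shape
    max_atoms = n if max_atoms is None else max_atoms
    s = np.zeros(K)
    support = []
    coef = np.zeros(0)
    residual = x.astype(float).copy()
    while np.linalg.norm(residual) > delta and len(support) < max_atoms:
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom can reduce the residual further
        support.append(k)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        s[support] = coef
    return s

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))   # stand-in for a 64 x 256 dictionary
D /= np.linalg.norm(D, axis=0)       # unit-norm atoms
x = 5.0 * D[:, 3] - 4.0 * D[:, 50]   # synthetic patch built from two atoms
s = omp(D, x, delta=1.0)
print(np.count_nonzero(s), np.linalg.norm(x - D @ s))
```

A smaller `delta` forces more iterations and therefore a denser sparse vector, which is the trade-off between accuracy and computation discussed for the global error.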

The objective evaluations MI, Q_W, and Q^AB/F of the fused images
of Figs. 6-8 are listed in Tables 1-3, respectively. According to
these tables, the objective evaluations of all methods using OMP
approximation (columns a, b, c, g, h, and i) are clearly better
than those of all methods using PFP approximation (columns d, e, f,
j, k, and l). The method VIdictionary_maxl1norm_PFP, corresponding
to column e of Tables 1-3, is the best among all methods using PFP
approximation. Thus, in the next experiment, we employ this method
to discuss the relationship between the global error δ in Eq. (4)
and the fusion quality. The method IRdictionary_maxabsolute_OMP,
corresponding to column g of Tables 1-3, attains almost all of the
largest objective evaluations, which are indicated in bold.
Therefore, in the following experiment, we select this method to
study the relationship between the objective evaluations and both
the length of the trained overcomplete dictionary (the value of K)
and the number of iterations of dictionary training. From Tables
1-3, we can also see that, with PFP approximation, the different
fusion rules have no effect on the objective evaluations for any of
the dictionaries (columns d-f versus columns j-l).
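
As an illustration of how an MI score of this kind can be computed, the sketch below estimates mutual information from joint grey-level histograms. It assumes the commonly used definition in which the fusion MI is the sum of the mutual information between the fused image and each source image; whether this matches the exact definition used in this paper is an assumption, and the names `mutual_information`, `fusion_mi` and the random test images are hypothetical.

```python
import numpy as np

def mutual_information(u, v, bins=256):
    """Plug-in estimate of I(U;V) in bits from the joint grey-level histogram."""
    joint, _, _ = np.histogram2d(u.ravel(), v.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    p = joint / joint.sum()
    pu = p.sum(axis=1, keepdims=True)   # marginal of u, shape (bins, 1)
    pv = p.sum(axis=0, keepdims=True)   # marginal of v, shape (1, bins)
    nz = p > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (pu @ pv)[nz])))

def fusion_mi(src_a, src_b, fused):
    """Assumed fusion metric: MI(A, F) + MI(B, F)."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)

rng = np.random.default_rng(1)
img_a = rng.integers(0, 256, size=(64, 64))   # stand-in infrared image
img_b = rng.integers(0, 256, size=(64, 64))   # stand-in visible image
img_f = np.where(img_a > img_b, img_a, img_b) # toy "fused" image
print(fusion_mi(img_a, img_b, img_f))
```

Larger values indicate that the fused image shares more grey-level information with the two sources, which is why the largest entries in the tables are the ones of interest.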

The global error δ in Eq. (4) has an important effect on the
computation speed of the PFP algorithm. As the value of δ
decreases, the PFP computation slows down rapidly. However,
careful inspection of Fig. 9 reveals that the visual quality of the
fused image improves noticeably as the value of δ decreases. The
objective evaluations of Fig. 9, shown in Table 4, also support
this conclusion. Compared with the best objective evaluations of
Table 3, however, the best values of Table 4 (indicated in bold)
are still not ideal. Therefore, in general, the performance of OMP
is better than that of PFP in image fusion.

The number of atoms K of the overcomplete dictionary and the number
of iterations for dictionary training are two important parameters. The






Fig. 7. Fused images of source images 2 using different approaches. 
(a) IRdictionary_maxl1norm_OMP (b) VIdictionary_maxl1norm_OMP (c) HYdictionary_maxl1norm_OMP
(d) IRdictionary_maxl1norm_PFP (e) VIdictionary_maxl1norm_PFP (f) HYdictionary_maxl1norm_PFP
(g) IRdictionary_maxabsolute_OMP (h) VIdictionary_maxabsolute_OMP (i) HYdictionary_maxabsolute_OMP
(j) IRdictionary_maxabsolute_PFP (k) VIdictionary_maxabsolute_PFP (l) HYdictionary_maxabsolute_PFP

Fig. 8. Fused images of source images 3 using different approaches.


Table 1
Quantitative assessment of various fusion approaches for Fig. 6.

Fig. 6   a       b       c       d       e       f       g       h       i       j       k       l
MI       1.6001  1.5751  1.6518  1.4387  1.5551  1.4984  2.3624  2.1504  2.3528  1.4387  1.5551  1.4984
Q_W      0.5724  0.5738  0.5621  0.2512  0.3262  0.2133  0.5616  0.5635  0.5473  0.2512  0.3262  0.2133
Q^AB/F   0.5754  0.5638  0.5689  0.2769  0.3084  0.2617  0.6218  0.6048  0.6270  0.2769  0.3084  0.2617


number of atoms of the dictionaries shown in Fig. 4 is 256.
Fig. 10 shows two dictionaries with different numbers of atoms
(K = 128 and K = 384), together with the fused images obtained
using these two dictionaries and the method
IRdictionary_maxabsolute_OMP. Since K is equal to the length of the
sparse vector, a large K results in burdensome calculation.
Compared with the quantitative assessment of the fused image
obtained with the K = 256 dictionary, different values of K have no
obvious effect on the objective evaluations. Therefore, we can
select the value of K solely according to the criterion K ≥ n²,
where n × n is the size of the image patches used for dictionary
training. In addition, Fig. 11
Table 2
Quantitative assessment of various fusion approaches for Fig. 7.

Fig. 7   a       b       c       d       e       f       g       h       i       j       k       l
MI       1.4189  1.3258  1.6679  2.0942  2.0869  2.0010  3.2490  2.9635  3.1216  2.0942  2.0869  2.0010
Q_W      0.3426  0.3447  0.3541  0.1928  0.2303  0.1758  0.4650  0.4586  0.4600  0.1928  0.2303  0.1758
Q^AB/F   0.4450  0.4347  0.4694  0.2566  0.2869  0.2423  0.5896  0.5735  0.5828  0.2566  0.2869  0.2423


Table 3
Quantitative assessment of various fusion approaches for Fig. 8.

Fig. 8   a       b       c       d       e       f       g       h       i       j       k       l
MI       1.7893  2.4241  1.9725  2.0942  2.0869  2.0010  2.7291  1.7589  2.8064  2.0942  2.0869  2.0010
Q_W      0.4822  0.4936  0.4820  0.2702  0.2964  0.1699  0.4908  0.4830  0.4793  0.2702  0.2964  0.1699
Q^AB/F   0.4929  0.4939  0.4965  0.1952  0.2044  0.1296  0.5154  0.4771  0.5255  0.1952  0.2044  0.1296



δ = 0.0001 δ = 0.00001
Fig. 9. Fused images according to different δ using BP algorithm.


Table 4
Quantitative assessment of various fusion approaches for Fig. 9.

δ        0.1     0.01    0.001   0.0001  0.00001
MI       2.1036  2.1711  2.1941  1.7647  1.6831
Q_W      0.3214  0.3731  0.4371  0.5028  0.4935
Q^AB/F   0.2299  0.3245  0.432   0.4518  0.4760


shows the overcomplete dictionaries obtained with different
numbers of training iterations (200 and 300), the fused images
obtained using these two dictionaries and the method
IRdictionary_maxabsolute_OMP, and the corresponding objective
evaluations. Comparing Fig. 11 with column g of Table 3, we can see
that the objective evaluations show no obvious change when the
number of iterations for dictionary training increases.
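
The relation between the patch size and the dictionary dimensions can be sketched as follows. The example assumes that each n × n patch is flattened into a column of length n², so that a dictionary with K atoms has size n² × K, consistent with the 64 × 256 dictionaries used above for n = 8. The helper `patches_as_columns`, the step size and the synthetic image are hypothetical, and the dictionary training itself (K-SVD) is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patches_as_columns(image, n=8, step=1):
    """Collect n x n patches (every `step` pixels) as columns of length n*n."""
    windows = sliding_window_view(image, (n, n))[::step, ::step]
    return windows.reshape(-1, n * n).T.astype(float)

n = 8
image = np.arange(32 * 32, dtype=float).reshape(32, 32)  # synthetic image
Y = patches_as_columns(image, n=n, step=4)               # training matrix

for K in (128, 256, 384):
    # A dictionary for these patches would have shape (n*n, K);
    # it is overcomplete when K exceeds the patch dimension n*n.
    print(K, (n * n, K), K >= n * n)
```

All three atom numbers considered here satisfy the size criterion for 8 × 8 patches, so under this assumption the choice among them is mainly a matter of computational cost.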

5. Conclusion 

In this paper, we study 12 different approaches to image fusion
based on CS theory. The approaches combine different trained
overcomplete dictionaries (the dictionary using only patches from
the infrared images, the dictionary using only patches from the
visible images, and the dictionary using the combined patches),
different sparse vector approximations (the OMP and PFP
algorithms), and different fusion rules (the maximum l1-norm and
the maximum absolute of entry of sparse vector, the latter being
first proposed in this paper). Three pairs of source images
consisting of visible and infrared images are used to test the
performance of the different methods. The objective evaluations
MI, Q_W, and Q^AB/F are used to quantitatively evaluate the fusion
performance. The experimental results of these 12 methods can be
summarized as follows: (1) The methods using OMP approximation
provide better fusion results under the same parameter setting.
(2) The method IRdictionary_maxabsolute_OMP attains almost all of
the largest objective evaluations. (3) Using PFP approximation and
the same dictionary, the objective evaluations of fused images
based on different fusion rules are identical. (4) As the value of
δ decreases, the visual quality of the fused image using PFP
approximation improves noticeably. (5) The atom number
Dictionary (K = 128), fused image; Dictionary (K = 384), fused image; quantitative assessment of different K (K = 128, K = 256, K = 384)


Fig. 10. Fused images and objective evaluations using dictionaries of different atom numbers. 



Fig. 11. Fused images and objective evaluations using two dictionaries of different iteration. 


K contained in the trained dictionary has no obvious effect on the
objective evaluations. (6) The objective evaluations show no
obvious change when the number of iterations for dictionary
training increases.